Shading topography imaging for robotic unloading

ABSTRACT

Vision systems for robotic assemblies for handling cargo, for example, unloading cargo from a trailer, can determine the position of cargo based on shading topography. Shading topography imaging can be performed by using light sources arranged at different positions relative to the image capture device(s).

CROSS-REFERENCE

This application claims the benefit of priority to U.S. Provisional Application No. 62/860,307, filed on Jun. 12, 2019, entitled “Shading Topography For Robotic Unloading,” the contents of which are hereby incorporated by reference in their entirety, including but not limited to those portions concerning devices, systems, and methods related to vision systems and/or robotic unloading.

FIELD

The present disclosure relates to vision systems, and particularly to vision systems for robotics.

SUMMARY

According to one aspect of the present disclosure, a robotic assembly for handling cargo may comprise at least one robotic appendage for handling cargo; and a vision system for acquiring information to assist operation of the at least one robotic appendage. The vision system may comprise at least two light sources arranged spaced apart from each other; at least one camera configured to capture images of cargo for handling by the robotic assembly; and a vision control system for determining position of cargo. The vision control system may include at least one processor for executing instructions stored on a memory to determine position of cargo based on a shading topography image comprising a blended composite of information of a plurality of images captured by the at least one camera.

In some embodiments, the shading topography image may comprise a convolutional shading topography image comprising a convolution of the plurality of images captured by the at least one camera. The convolution of the plurality of images may comprise convolution of predefined channels of images captured by the at least one camera.

In some embodiments, one of the predefined channels may comprise image data captured under illumination by at least one, but fewer than all, of the at least two light sources, and other image data captured with illumination by at least another one, but fewer than all, of the at least two light sources. The image data captured under illumination by at least one, but fewer than all, of the at least two light sources may comprise image data captured under illumination by one of the at least two light sources having a first upper directional lighting trajectory. The other image data captured with illumination by at least another one, but fewer than all, of the at least two light sources may comprise image data captured under illumination by another one of the at least two light sources having a first lower directional light trajectory.

In some embodiments, another one of the predefined channels may comprise image data captured under illumination by at least one, but fewer than all, of the at least two light sources having a second upper directional light trajectory, different from the first upper directional light trajectory. The another one of the predefined channels may comprise other image data captured with illumination by at least another one, but fewer than all, of the at least two light sources having a second lower directional light trajectory, different from the first lower directional light trajectory. In some embodiments, at least one of the predefined channels may comprise image data captured with illumination by greater than one of the at least two light sources. The image data captured with illumination by greater than one of the at least two light sources may include image data captured under illumination by first and second directional light trajectories for each of the at least two light sources.

In some embodiments, the at least one camera may be arranged between two of the at least two light sources. The at least one camera may be configured to capture at least one image of cargo under illumination by at least one of the at least two light sources and at least another image of cargo under illumination by another one of the at least two light sources. In some embodiments, configuration of the at least one camera to capture the at least one image may include configuration to capture the at least one image under illumination by at least one of the at least two light sources without illumination by another of the at least two light sources. Configuration of the at least one camera to capture the at least another image may include configuration to capture the at least another image under illumination by at least the another of the at least two light sources without illumination by the at least one of the at least two light sources.

In some embodiments, the at least one camera may be coupled with a robotic unloading machine comprising the at least one robotic appendage. At least one of the at least two light sources may be coupled with the robotic unloading machine. At least one of the at least two light sources may include a first light having a first directional lighting trajectory and a second light having a second light trajectory.

In some embodiments, the vision control system may be adapted to conduct an imaging sequence including communicating with the at least one camera to capture one or more images of a wall of cargo under a predetermined illumination scheme of the at least two light sources. The predetermined illumination scheme may include one or more images having none of the at least two light sources illuminated, one or more images having all of the at least two light sources illuminated, and/or one or more images having fewer than all of the at least two light sources illuminated.

In some embodiments, the one or more images having fewer than all of the at least two light sources illuminated may include at least one image under illumination by only a first light of one of the at least two light sources having a first directional lighting trajectory. The one or more images having fewer than all of the at least two light sources illuminated may include at least one image under illumination by only a second light of the one of the at least two light sources having a second directional lighting trajectory, different from the first directional lighting trajectory. In some embodiments, the shading topography image may comprise an expression of absolute value of a gradient sum of intensity values of a number of images of cargo acquired by the at least one camera. In some embodiments, the at least two light sources may include two light sources. The two light sources may be a pair of light sources. The two light sources may each have at least two distinct lenses for applying different light trajectories.

According to another aspect of the present disclosure, a vision system of a robotic assembly for handling cargo may comprise at least two light sources arranged spaced apart from each other; at least one camera configured to capture images of cargo for handling by the robotic assembly; and a vision control system for determining position of cargo. The vision control system may include at least one processor for executing instructions stored on a memory to determine position of cargo based on a shading topography image comprising a blended composite of information of a plurality of images captured by the at least one camera. In some embodiments, the at least two light sources may include two light sources. The two light sources may be a pair of light sources. The two light sources may each have at least two distinct lenses for applying different light trajectories. In some embodiments, the at least one camera is arranged between two of the at least two light sources.

According to another aspect of the present disclosure, a vision system of a robotic assembly for handling cargo may include at least two light sources arranged spaced apart from each other, at least one camera configured to capture images of cargo for handling by the robotic assembly; and a vision control system for determining position or location of cargo. The vision control system may include at least one processor for executing instructions stored on a memory to determine position of cargo based on a shading topography image captured by the at least one camera.

In some embodiments, the shading topography image may include an expression of absolute value of a gradient sum of intensity values of a number of images of cargo acquired by the at least one camera. The at least one camera may be arranged between two of the at least two light sources. The at least one camera may be configured to capture at least one image of cargo under illumination by one of the at least two light sources and at least another image of cargo under illumination by another of the at least two light sources.

In some embodiments, configuration to capture at least one image of cargo under illumination by each of the one and the another of the at least two light sources may include illumination by the one of the at least two light sources without illumination by the another of the at least two light sources. Configuration to capture at least one image of cargo under illumination by each of the one and the another of the at least two light sources may include illumination by the another of the at least two light sources without illumination by the one of the light sources. In some embodiments, the vision system may be configured to determine position of cargo based on a combination of the shading topography image and a stereo depth image.

According to another aspect of the present disclosure, a robotic assembly for handling cargo may include at least one robotic appendage for handling cargo; and a vision system for acquiring information to assist operation of the at least one robotic appendage. The vision system may include at least two light sources arranged spaced apart from each other, at least one camera configured to capture images of cargo for handling by the at least one robotic appendage; and a vision control system for determining position of cargo. The vision control system may include at least one processor for executing instructions stored on a memory to determine position of cargo based on a shading topography image captured by the at least one camera.

In some embodiments, the shading topography image may include an expression of absolute value of a gradient sum of intensity values of a number of images of cargo acquired by the at least one camera. The at least one camera may be arranged between the at least two light sources. The at least one camera may be coupled with the robotic appendage.

In some embodiments, the at least one camera may be configured to capture at least one image of cargo under illumination by one of the at least two light sources and/or at least another image of cargo under illumination by another of the at least two light sources.

In some embodiments, configuration to capture at least one image of cargo under illumination by each of the one and the another of the at least two light sources may include illumination by the one of the at least two light sources without illumination by the another of the at least two light sources, and/or illumination by the another of the at least two light sources without illumination by the one of the light sources. The vision control system may be configured to determine position of cargo based on a combination of the shading topography image and a stereo depth image.

Additional features of the present disclosure will become apparent to those skilled in the art upon consideration of illustrative embodiments exemplifying the best mode of carrying out the disclosure as presently perceived.

BRIEF DESCRIPTION OF THE DRAWINGS

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.

The concepts described in the present disclosure are illustrated by way of example and not by way of limitation in the accompanying figures. For simplicity and clarity of illustration, elements illustrated in the figures are not necessarily drawn to scale. For example, the dimensions of some elements may be exaggerated relative to other elements for clarity. Further, where considered appropriate, reference labels have been repeated among the figures to indicate corresponding or analogous elements.

The detailed description particularly refers to the accompanying figures in which:

FIG. 1 is an elevation view of a cargo truck having a cargo trailer rendered transparent to reveal a robotic assembly loaded thereon for handling (loading and/or unloading) cargo, and showing that the robotic assembly includes a shading topography vision system for managing visual information of the cargo to assist in handling the cargo, and showing that the cargo includes a number of cartons stacked to present a carton wall facing the robotic assembly;

FIG. 2 is a diagrammatic view indicating a principle of the shading topography vision system of the robotic assembly of FIG. 1, and showing an image intensity plot indicating aspects of shading topography;

FIG. 3 is another diagrammatic view indicating a principle of the shading topography vision system of the robotic assembly of FIG. 1, and showing an image intensity plot indicating aspects of shading topography;

FIG. 4 is another diagrammatic view indicating a principle of the shading topography vision system of the robotic assembly of FIG. 1, and showing an image intensity plot indicating aspects of shading topography;

FIG. 5 is a diagrammatic view of the image intensity plots of FIGS. 3 and 4, and a combined gradient sum of the image intensity plots of FIGS. 2 and 3, indicating aspects of shading topography;

FIG. 6 is a diagrammatic view of the robotic assembly of FIG. 1 showing exemplary positions of cameras and lights;

FIG. 7 is a flow diagram of a process for determining position or location of cargo;

FIG. 8 is a diagrammatic view of the robotic assembly of FIG. 1 showing an exemplary manner of alignment;

FIG. 9 is a diagrammatic view of the robotic assembly of FIG. 1 showing an exemplary manner of alignment;

FIG. 10 is a diagrammatic view of the robotic assembly of FIG. 1 addressing a carton wall;

FIG. 11 is a depiction of an image captured of a carton wall under illumination by one light of a top light source (left) and a depiction of an image captured of the carton wall under illumination by one light of a bottom light source (right);

FIG. 12 is a depiction of an image captured of a carton wall under illumination by another light of a top light source (left) and a depiction of an image captured of the carton wall under illumination by another light of a bottom light source (right);

FIG. 13A is a color depiction of a shading topography image of the carton wall of FIGS. 11 and 12;

FIG. 13B is a color depiction of a stereo binocular image of the carton wall of FIGS. 11 and 12;

FIG. 14A is the same depiction of the carton wall of FIG. 13A rendered in black and white;

FIG. 14B is the same depiction of the carton wall of FIG. 13B rendered in black and white;

FIG. 15 is another diagrammatic view indicating a principle of the shading topography vision system of the robotic assembly of FIG. 1, and showing an image intensity plot indicating aspects of shading topography;

FIG. 16 is another diagrammatic view indicating a principle of the shading topography vision system of the robotic assembly of FIG. 1, and showing an image intensity plot indicating aspects of shading topography; and

FIG. 17 is a diagrammatic view of the robotic assembly of FIG. 1 including a neural network.

DETAILED DESCRIPTION

While the concepts of the present disclosure are susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that there is no intent to limit the concepts of the present disclosure to the particular forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives consistent with the present disclosure and the appended claims.

Referring to FIG. 1, a robotic assembly 110 is shown addressing a load of cargo 112. The load of cargo 112 is illustratively arranged within a cargo trailer 114 of a cargo truck. The cargo 112 includes a number of cartons 116 stacked to form carton walls 118, 120. The carton wall 118 presents a face 122 comprising the faces 124 of each of the cartons 116 of the carton wall 118. The carton wall 118 illustratively includes five cartons 116 stacked vertically and forming a single column, and may include any suitable number of columns extending into the page, across the width of the trailer 114. In some instances, cartons 116 of one column may intermingle with other columns, depending on size and position within the carton wall 118. Each of the five cartons 116 illustratively forms a portion of a corresponding row of cartons 116 of the carton wall 118, which collectively make up the carton wall 118. An example of a carton wall can be observed in FIGS. 11 and 12.

In handling the cartons 116, for example, to perform (partly or fully) autonomous robotic unloading of the trailer 114, the robotic assembly 110 may determine a location of the individual cartons 116 to assist in picking (grabbing and/or lifting the carton) safely and/or effectively. The robotic assembly 110 illustratively includes a vision system 126 for determining carton location or position. The vision system 126 is illustratively embodied as a shading topography vision system which can develop a shading topography image for use in efficiently detecting carton position or location.

Computerized vision approaches (vision systems) and Light Detection and Ranging (Lidar) can combine to produce detailed 3-dimensional (3D) views in conventional robotic guidance systems. Vision systems can rely on ambient lighting and/or use integrated lighting to capture or form an image. Infrared and/or near-infrared lighting can be used by vision systems, for example, to assist in avoiding and/or filtering out interference from visible lights, such as ambient and/or uncontrolled background light. 3D imaging may be applied as a type of vision system used in robotic guidance. Depth information may be obtained from time-of-flight, Lidar, or stereo imaging approaches in vision systems. Time-of-flight vision systems can use light pulses to determine the distance of objects, and can be limited to depth maps with about 1 cm (0.4 inch) resolution depending on the sophistication of the electronics and/or techniques. Binocular stereo can use two cameras arranged to create a shift in the image (disparity) that can be related to the depth at that point. Small and/or abrupt depth transitions may not be detectable with this technique.
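
As a hedged illustration of the disparity-to-depth relation mentioned above, a minimal sketch is given below; the focal length and baseline values are assumptions chosen only for the example and are not taken from this disclosure.

    # Minimal sketch of the standard pinhole relation: depth = focal * baseline / disparity.
    # The focal length (pixels) and baseline (mm) are illustrative assumptions only.
    def depth_from_disparity(disparity_px, focal_px=1400.0, baseline_mm=150.0):
        if disparity_px <= 0:
            return float("inf")  # no measurable shift between the two views
        return focal_px * baseline_mm / disparity_px  # depth in mm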

Vision guidance for robotic picking can use 3D imaging with any combination of the techniques mentioned above to detect the pick coordinates to which a robot navigates. For many applications where the depth variation may be expected to be within the capabilities of these techniques, the approach can be successful, for example, as mentioned within U.S. Pat. No. 7,313,464 B1, issued on Dec. 25, 2007, the contents of which are hereby incorporated by reference in their entirety, including but without limitation, those portions related to robotic operations. Vision systems for robotic picking can use color and/or depth information to determine boundaries of one or more of the objects to be picked. Vision systems may be used to guide robots to pick from stacks of cartons on a floor. Conventional approaches to vision systems may focus on picking of cartons with large differences in depth and/or color that may allow the location of the cartons to be determined using this information.

Walls of cartons, for example the carton wall 118, within a trailer can pose unique challenges for vision systems. For example, a trailer may be about 98.4 inches by about 98.4 inches (2500 mm by 2500 mm) wide by tall, while the interface between the cartons of its cargo can be merely fractions of an inch (a few millimeters) wide and/or deep. Cartons may have no markings and/or may be covered in artwork that can differ between any two cartons. The cartons may be densely packed to fully utilize the space in the trailer. The cartons may be filled with product that can be heavy and cause the interface between cartons to be compressed. Conventional methods may not present solutions including vision systems capable of determining the location of some or all of the cartons stacked in a wall within a trailer without significant depth and/or visual variation between adjacent cartons.

The topography of the surface (face) of a wall of cartons stacked inside a trailer can assist in robotic picking of the cartons. The location of the interfaces between cartons can be combined with the depths between the cartons to generate pick coordinates for a robot in handling the cartons.

For example, an industrial robot may be a six-axis robot for handling the weight of cartons filled with product, such as to move them at high speed to another location. An exemplary maximum reach of such a robot is about 78.8 inches (2 m). For a vision system to be used to guide the robot, it generally may be mounted on the robot. Since the robot may need to be at the picking location relative to each wall, the lighting of the vision system may be required to move with the robot. In some embodiments, the vision system 126, cameras 130, and/or light sources 150 may be partly or wholly mounted apart from the robotic assembly 110; for example, one light source 150 may be mounted apart from the robotic assembly 110 while another is mounted on the robotic assembly 110.

Within the present disclosure, a vision system may apply two light sources arranged at different positions so that the vision system can detect the precise location of interfaces between cartons. The cartons may include arbitrary and/or unknown artwork on their sides. The location of the interfaces may be communicated to the robot control system in its coordinate space, for example, after an alignment procedure between the robot and the vision system. Unlike Photometric Stereo approaches, calibration at each instance of the robot addressing cargo having different environmental lighting aspects can be avoided and/or may not be required to detect the interfaces. Examples of suitable robotic cargo handling devices, systems, and/or methods may be found within U.S. Patent Application Publication Nos. 2015/0352721 A1, filed on Jun. 4, 2015, and 2016/0068357 A1, filed on Sep. 5, 2014, the contents of each of which are hereby incorporated by reference in their entireties, including but without limitation, those portions related to robotic operations.

Shape-from-shading can involve estimation of surface topography in a single image with a single known light source. Since the intensity of the reflected light at a point in the image can depend on its reflectance (bright to dark) and its orientation with respect to the light and the observer (camera), the depth of the point may not be able to be unambiguously computed from one image and one light for the general case. For example, a point on a surface can appear to be dark because of its color (reflectance), or, perhaps, because it is oriented leaning away from the camera. To overcome this issue, Photometric Stereo (PS) can use at least three (3) lights oriented orthogonal to each other, and three (3) images, to estimate both the depth and the reflectance at a point. Robotic picking applications can be constrained by the placement of the lights and/or cameras and may not allow the placement of the three lights to illuminate the entire region in which the robot needs to be guided. PS can require a calibration with objects of known shape and reflection, on each instance of addressing a target, such as cargo. Since the position, angle, and/or intensity of the lights affect the reflected light intensity in addition to the local reflectance and the surface shape (surface normal), PS can require a calibration to create a model of the known lighting variation within the field of view for all of the lights and/or camera positions. This may need to be recomputed if a light changes in intensity, position, or orientation over time, for example, due to thermal changes, lights and/or lenses becoming soiled, and/or generally as they age over time. Recalibration may be required for any change of camera position, orientation, or optics. Specular surfaces within the field of view may need additional constraints and/or calibration. Further, the walls of a trailer are typically metal, which can result in various specular reflections.

Topography can include the map of a terrain or any surface illustrating the topographical features of the terrain or surface. Devices, systems, and methods within the present disclosure may use lights with preferred lensing placed at desired locations to produce shading over the surface of a wall of cartons.

As shown in FIG. 2, light sources 213, 215 can be arranged to illuminate a wall of cartons 212, for example, from the top and bottom, in a cargo arrangement such as within a trailer. An image of the carton wall 212 can be captured by the camera 211, which can be arranged centrally in the trailer. A single camera 211 is shown for descriptive purposes, yet multiple cameras 211 may be applied. The carton wall 212 is analogous to the carton wall 118 for descriptive purposes in disclosing the application of shading topography imaging.

The top and bottom light sources 213, 215 may be turned on (illuminated) sequentially and/or in series to assist in capturing images. Each of the top and bottom light sources 213, 215 may have multiple lights, and the different lights of each light source 213, 215 may have a different lens 214, 216. The lights of each light source 213, 215 may be formed as one or more light emitting diodes (LEDs). The LEDs may be strobed with emission at 850 nm wavelength. The light sources 213, 215 may each be individually synchronized to the camera's exposure active signal.

One or more cameras 211 may each be embodied as a CMOS sensor. The camera(s) 211 may have high response at the 850 nm wavelength and may have a large pixel size (>4 microns) to reduce the amount of lighting required to sufficiently illuminate the entire wall of cartons 212. Camera exposure may be varied for each acquired image, and a sequence of exposures can be used to acquire high dynamic range images. High dynamic range images can include, for example, an image formed as a synthesis of one or more acquired images having different exposures in which a selected pixel value for each pixel of the high dynamic range image is selected as the greatest pixel value for that pixel among the acquired images, and if the selected pixel value is saturated (for example, 255 in an 8 bit image) then the saturated value can be replaced by the next lower pixel value for that pixel among the acquired images. The camera(s) 211 may be mounted on the robot frame, as a portion of the robotic assembly 110, and may therefore be arranged to be within about 60 inches (1500 mm) of the wall, for typical industrial 6-axis robots.
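
A minimal sketch of the exposure-merging rule just described is shown below, assuming 8 bit input images held in NumPy arrays; it is illustrative only and not necessarily the disclosure's implementation.

    import numpy as np

    def hdr_composite(exposures, saturation=255):
        # Per pixel, keep the greatest value across the exposure sequence; where that
        # value is saturated, fall back to the next lower value for that pixel.
        stack = np.sort(np.stack(exposures), axis=0)      # ascending along exposure axis
        best = stack[-1]
        runner_up = stack[-2] if len(exposures) > 1 else best
        return np.where(best >= saturation, runner_up, best)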

A lens of the camera(s) 211 for imaging the carton wall 212 may illustratively include a lens focal length of about 0.12 inches (3 mm), which can provide a wide angle of acceptance and a resulting field of view of about 98 by about 98 inches (2500 by 2500 mm). With a CMOS sensor, for example, Sony's IMX sensors with 2000×2000 pixels, the vision system may have a resolution of about 0.04 inches (1 mm) per pixel. The wide angle of acceptance of the camera lens can require that the image be corrected for lens distortion to provide accurate pick coordinates for the robot. A narrow bandpass filter of about 850 nm can be placed between the lens and the camera. The wide acceptance angle of the lens can be assisted by this arrangement to reduce the amount of stray light that can enter the camera(s). Each camera 211 may be arranged in communication with a computer, for example, via a Gigabit Ethernet cable. The camera(s) 211 may be powered over the same cable with a Power over Ethernet card in the computer. In some embodiments, the camera may communicate wirelessly with the computer. The computer may be configured to turn off the camera(s) when the camera(s) is/are not in use. A high-flex cable with over 10 million cycles of flex and a protective sheath may be used to avoid damage to the cable while in operation. Active cooling of the cameras can be applied, for example, by the use of a copper heatsink and a fan.

The camera(s) 211 may capture images of the carton wall 212 under different illumination schemes. For example, the camera(s) 211 may capture an image 217 under illumination by only the light source 213 as an upper light source, and/or the camera(s) 211 may capture an image 218 under illumination by only the light source 215 as a lower light source.

In one example of operation, six images may be obtained with each of two cameras 211. The sequencing of each image capture can be performed by a specific hardware pulse generator that turns on each light and camera in synch as appropriate for the sequence. This can allow for all of the images to be captured by the cameras within a brief timeframe, for example, about one second. The cameras may be capable of 30 frames per second; however, some of the exposure times may be longer than others and may add up to one second. In no particular order, the exemplary six images (image sets) may include one image for each of the following characteristics (a sequencing sketch follows the list):

1. No lights—8 bit image

2. Lights for binocular stereo—8 bit image

3. Top light angle 1—12 bit image

4. Top light angle 2—12 bit image

5. Bottom light angle 1—12 bit image

6. Bottom light angle 2—12 bit image
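
The following sketch illustrates one way such a sequence could be scripted; the lights.set() and camera.capture() calls are hypothetical stand-ins for the hardware pulse generator and camera trigger interfaces described above, and the lighting assignment for the binocular stereo frame is an assumption.

    # Hypothetical sequencing sketch; lights.set() and camera.capture() stand in for
    # the hardware pulse generator and camera trigger described above.
    SEQUENCE = [
        ("no_lights",        dict(top=None,     bottom=None),      8),
        ("binocular_stereo", dict(top="both",   bottom="both"),    8),   # assumed lighting
        ("top_angle_1",      dict(top="angle1", bottom=None),     12),
        ("top_angle_2",      dict(top="angle2", bottom=None),     12),
        ("bottom_angle_1",   dict(top=None,     bottom="angle1"), 12),
        ("bottom_angle_2",   dict(top=None,     bottom="angle2"), 12),
    ]

    def acquire_image_set(camera, lights):
        images = {}
        for name, lighting, bit_depth in SEQUENCE:
            lights.set(**lighting)                         # strobe only the listed LEDs
            images[name] = camera.capture(bits=bit_depth)  # exposure synchronized to strobe
        return images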

FIGS. 3 and 4 illustrate the principle of shading topography in two dimensions between adjacent cartons A, B having their upper surfaces A₁, B₁ parallel and in line with no difference in depth. In FIG. 3, the camera axis 328 splits the viewing plane into two half planes, one half plane to the right, and one half plane to the left of the axis 328. The light source 321 is illustratively positioned in the same half plane as the interface between the cartons 329. For cartons manufactured with cardboard, plastics, and other non-metallic materials, the surface can be diffuse and may scatter the light in many directions, as suggested by numeral 324. The scattered light intensity can be estimated by using a model such as the Lambertian Bidirectional Reflectance Distribution Function (BRDF).

For descriptive purposes, in the lower region of FIG. 3, the intensity of the scattered light is plotted by line 325 depending on the incident light intensity, which follows a generally inverse square relationship to the distance from the light source 321. The scattered light intensity 325 may also depend linearly on the cosine of the angle 323 between the surface normal 322 and the incident ray from the light source 321. At the location of the carton interface 329, the angle of incidence 323 is nearly 90 degrees, and so the light intensity falls sharply as illustrated at numeral 3211. As the surface normal changes back to the orientation at 322, the intensity returns to the trend of line 325.
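
A minimal numerical sketch of the dependence just described (inverse-square falloff with distance and a linear cosine term for the incidence angle, per a Lambertian-style model) is shown below; it is illustrative and not the specific model of the disclosure.

    import numpy as np

    def scattered_intensity(point, light_pos, normal, albedo=1.0):
        # Intensity proportional to albedo * cos(incidence angle) / distance**2,
        # mirroring the trend of line 325 away from the carton interface.
        to_light = np.asarray(light_pos, float) - np.asarray(point, float)
        distance = np.linalg.norm(to_light)
        cos_theta = np.dot(to_light / distance, np.asarray(normal, float))
        return albedo * max(cos_theta, 0.0) / distance**2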

The finite extent of the drop in intensity is suggested at numeral 3212 and is illustratively larger than the gap 3213 between the cartons A, B. This can result in an effective magnification of the interface region, which can be an advantage of the Shading Topography Image (STI) technique. The intensity drop signals, which are interpreted to apply the STI technique, do not rely on a difference in depth between the cartons A, B (from the camera and the light sources). Independence from the depth of cartons can be an advantage over any depth-based systems in detecting the interface between the cartons A, B, for example, in real world scenarios where a difference in depth may not exist. Even with zero gap between the cartons A, B, and/or with zero difference in depth between the cartons, the shading region is finite. Accordingly, STI techniques within the present disclosure can efficiently address such low differential instances.

The scattered intensity can also depend on the reflectance of the surface (A₁, B₁). For example, the area 326 is depicted as an area of the carton B where the reflectance of the surface B₁ is lower than the rest of the carton B. The lower reflectance at the area 326 is observed on the line 325 at numeral 327, which shows a drop in the observed intensity due to the lower reflectance of the area 326. The area 326 may represent a graphic on the carton and/or other disturbance in reflectance.

Referring now to FIG. 4, the camera axis 438 splits the plane into two half planes. In this case the light source 431 is in the other half plane (on the right side of the camera axis 438 in the orientation as shown in FIG. 4) compared with the arrangement in FIG. 3, and to the right of the interface 437 between the cartons A, B. This arrangement changes the geometry of the light scattering from the interface 437 between the cartons A, B, which is indicated by a sharp drop 4310 in the image intensity plot 436. Notably, as the light source 431 is oriented in the other half plane relative to that in FIG. 3, the intensity drop 4310 in FIG. 4 is more pronounced on the right side of the plot 436, while the intensity drop 3211 in FIG. 3 is more pronounced on the left side of the plot 325. The intensity of each drop 3211, 4310 climbs back to the trend in approaching the camera position. The area 438a represents an area of low reflectance similar to area 326 in FIG. 3.

Referring now to FIG. 5, the intensity plots of the images acquired under the light sources 321, 431 from the examples of FIGS. 3 and 4 are shown, which are illustratively embodied as upper 321 and lower 431 lights. The dotted line 542 represents the location of the interface between the cartons A, B. Lines 541 and 543 correspond to the extent of the shading perceived by the camera at numerals 4310 and 3211. The absolute value of the gradient of the sum of the individual intensity plots 325, 436, as plotted at 546, shows a distinct stepwise increase in the intensity at the position of the lines 541, 542, 543, which provides a clear signal that can be used to identify the location of the interface of the cartons A, B. Note that the change in intensity due to artwork 544, 545 is not highly perceivable in the gradient sum plot, because the position of the artwork (i.e., 326, 438a, represented respectively at 544 and 545) is the same in both the top and bottom lit image intensities. The STI signal contrast is dependent on the cosine of the angle of the source light to the surface normal; thus, at normal incidence the signal can be the largest.
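
A one-dimensional sketch of the signal plotted at 546 might look like the following, assuming the top-lit and bottom-lit intensity profiles are available as NumPy arrays.

    import numpy as np

    def shading_topography_signal(top_lit_profile, bottom_lit_profile):
        # |gradient of the sum| of the oppositely lit profiles: carton interfaces
        # produce a step, while artwork appearing at the same position in both
        # profiles contributes comparatively little to the gradient sum.
        total = np.asarray(top_lit_profile, float) + np.asarray(bottom_lit_profile, float)
        return np.abs(np.gradient(total))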

The process discussed above can be applied to locate the horizontal and/or vertical interfaces of the cartons A, B. The STI image can be generated with just two light sources (bars), which can be positioned close to the top and bottom of the carton wall. In the illustrative embodiment, the lights can be between about 35 inches (900 mm) and about 71 inches (1800 mm) long. They can incorporate multiple LEDs, which may be spaced evenly from each other for a given light source. The lights of each light source may be lensed individually and/or may be controlled in segments of about 12 inches (300 mm) in length arranged laterally with respect to the carton wall. By acquiring images with individual segments of lights, the vertical interfaces of the cartons can be highlighted in the STI image. To locate the horizontal interfaces, all of the segments of lights can be turned on during image acquisition.

Although labeled “top” and “bottom,” the lights can be interchanged and can be placed in a range of locations in each half plane. For the creation of a Shading Topography Image (STI), the lights should be in different half planes, defined by the camera axis, and should be oriented parallel to the camera axis or towards the camera axis. For a given size of a wall of cartons and the possible locations that a camera can be placed centrally on a robot, the lights should be placed at preferred locations for optimal performance of STI. For example, for a 6-axis robot, the preferred location for placement of the camera(s) for STI can be on Joint 3 (J3—653) of the robot as shown in FIG. 6. In the illustrative embodiment, the cameras 652a, 652b are illustratively arranged equally spaced apart from a center line 659 by about 3 inches, although in some embodiments, each camera 652a, 652b may be spaced from the centerline 659 by any suitable amount, for example but without limitation from about 0 to about 6 inches. The cameras 652a, 652b are illustratively arranged centrally in the width and height of the trailer, but in some embodiments, may be off-center with suitable data correction. In the illustrative embodiments, positioning the camera for imaging may include the angles of the joints as shown in FIG. 6, including a level horizontal arrangement of the Joint 3 (J3—653) positioning the cameras 652a, 652b horizontally level relative to each other. This arrangement can allow the camera to view the entire carton wall without occlusion. The camera axis 659 divides the plane of the carton wall into the top and bottom portions, and the lights for creating the STI are arranged in the top and bottom half planes.

The distance between the carton wall and the camera may be optimally determined in consideration of the following. In some embodiments, the light bars 651, 658 may be optimally mounted as far away from each other as possible, within practical constraints. For example, the light bars 651, 658 may be arranged with one close to the ceiling of the trailer and with one close to the floor of the trailer. FIG. 6 shows the bottom light bar 658 mounted close to the floor of the trailer and on the front surface of a base of the robot. The top light bar 651 is illustratively placed close to the ceiling 656 of the trailer near the end effector 6510 of the robot, which may be embodied as an end-of-arm tool.

In such embodiments, the placement of the top light bar 651 can be constrained by the requirement that it does not interfere with the primary function of the end-of-arm tool, which is to pick the cartons securely for placement onto conveyors or other transport. With the top and bottom locations of the light bars 651, 658, the intensity of the lighting on the carton wall can depend on the distance between the carton wall and the light bars 651, 658, and can depend on the lenses of each light of each light bar (e.g., FIG. 2, lenses 214, 216). The lens of the light source can determine the spread of the light from the light source. The light bars 651, 658 and/or lenses can be oriented to direct their light towards the horizontal plane at the camera axis.

In the illustrative embodiment, each light bar 651, 658 includes two sets of LED strips, one with a 120 degree lens and the other with a 30 degree lens, which are combined into a single light source so that they can illuminate different sections of the carton wall efficiently. The 120 degree lens LEDs of each light bar 651, 658 are illustratively pointed nominally perpendicular to the carton wall so that their highest intensity on the carton wall is close to the section of the wall near the light bar 651, 658. The 30 degree LEDs of each light bar 651, 658 are illustratively pointed so that their highest intensity on the carton wall is closer to the opposite end (far end) of the carton wall (for example, at 60 degrees from horizontal, as shown in FIG. 10). The use of two distinct angles of illumination can provide a more complete spread of light for each light source 651, 658 for image capture. Captured images can be processed to determine the characteristics of the light intensity.

Image processing can be performed by summing the images from a set of images generated using the spaced apart (opposing) light sources, e.g., light bars 651, 658. The gradient of the images can be obtained by convolution with a kernel. A directional gradient is obtained by using a kernel designed to detect gradients in one direction, such as the x- and y-Sobel gradient kernels illustrated below:
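
(The kernels below are the standard 3×3 Sobel operators, given as an assumed reconstruction since the original rendering of the kernels is not reproduced here; the convolution sketch is illustrative only.)

    import numpy as np
    from scipy.ndimage import convolve

    SOBEL_X = np.array([[-1, 0, 1],
                        [-2, 0, 2],
                        [-1, 0, 1]])   # detects gradients in the x direction
    SOBEL_Y = SOBEL_X.T                # detects gradients in the y direction

    def directional_gradients(summed_image):
        img = np.asarray(summed_image, float)
        return convolve(img, SOBEL_X), convolve(img, SOBEL_Y)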

Once the Shading Topography (ST) image is computed, the ST image may contain noise, for example, caused by deformations on the face of the cartons, manufactured holes, tears, and/or shadows due to depth variation of the cartons, in addition to the desired signal from the interface between cartons. An algorithm can be applied to find straight line segments within the ST image data. For example, the Hough transform can be applied to detect the valid boundaries between the adjacent cartons. The algorithm can be used with a scoring metric (e.g., a count of discernable lines found within a predefined window area within the image) such that the long continuous edges between cartons receive higher weighting than the noise created by other features.
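
A hedged sketch of such a line search using OpenCV's probabilistic Hough transform is shown below; the binarization threshold, Hough parameters, and the length-based weighting are assumptions standing in for the scoring metric described above.

    import cv2
    import numpy as np

    def candidate_interface_lines(st_image, min_length_px=400):
        # Binarize the ST image (illustrative threshold), find line segments with the
        # probabilistic Hough transform, and weight longer segments more heavily so
        # that long continuous carton edges outrank noise from holes, tears, or dents.
        edges = (st_image > st_image.mean() + 2 * st_image.std()).astype(np.uint8) * 255
        lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                                minLineLength=min_length_px, maxLineGap=20)
        if lines is None:
            return []
        scored = [((x1, y1, x2, y2), float(np.hypot(x2 - x1, y2 - y1)))
                  for x1, y1, x2, y2 in lines[:, 0]]
        return sorted(scored, key=lambda item: item[1], reverse=True)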

The ST image can show an extended region of signal where an interface between the cartons is present. This can be advantageous because it can make the interface easier to detect. The precise location of the interface within this region can be found by computing the midpoint of the shading region.

Inside a trailer in a warehouse, the background ambient light at 850 nm can be considered quite low. However, some trailers may contain portholes that can increase the intensity of background light significantly. A separate image can be acquired with all the lights of the vision system turned off to detect the presence of any background ambient light. This image can also be used to detect the presence of a Lidar at 850 nm used to position the robot. These signals can be subtracted from the images used for STI.
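
A minimal sketch of that subtraction step, assuming the lights-off frame and the lit frames are NumPy arrays of the same size, might look like the following.

    import numpy as np

    def subtract_background(lit_image, lights_off_image):
        # Remove ambient 850 nm light (e.g., porthole glare) and any positioning
        # Lidar signal captured in the lights-off frame, clipping at zero.
        diff = lit_image.astype(np.int32) - lights_off_image.astype(np.int32)
        return np.clip(diff, 0, None).astype(lit_image.dtype)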

The construction of a trailer itself can affect the lighting on cartons. For example, the ceiling can pose a particular challenge if it has a bare metal surface. Trailers with a wooden ceiling may not adversely affect the lighting of the carton wall. However, a metal ceiling can reflect light at unexpected angles, which can cause shadings in the images that can be mistaken for carton interfaces. To overcome this obstacle, the LED lights can be designed with two different sets of lenses: one lens with a wide 120 degree angle, and the other lens with a narrow 30 degree angle. Images can be acquired using both these lenses such that features which differ between the two lighting geometries can be used to remove the artifacts caused by the ceiling.

By way of example, images acquired with a top light and a bottom light are shown in FIGS. 11 and 12. In FIG. 11, the left hand image illustratively shows an image of a carton wall illuminated with the top light bar 651 using only the 30 degree lens and corresponding LED strip, while the left hand image in FIG. 12 shows an image captured of the same carton wall illuminated with the top light bar 651 using only the 120 degree lens and corresponding LED strip. In FIG. 11, the right hand image illustratively shows an image captured of the same carton wall illuminated with the bottom light bar 658 using only the 30 degree lens of the corresponding LED strip, while the right hand image in FIG. 12 shows an image captured of the same carton wall illuminated with the bottom light bar 658 using only the 120 degree lens and corresponding LED strip.

Referring to FIGS. 13 and 14, the left hand images (FIGS. 13A and 14A) show the computed shading topography images using the combined information from each of the images in each of FIGS. 11 and 12. The STI technique can provide an accurate location of the horizontal and/or vertical positions of the carton interfaces. The STI image data can be combined with binocular stereo image data (FIGS. 13B and 14B) to produce three dimensional location information (e.g., pick coordinates) in the camera's coordinate space. For example, for a carton wall region defined by the valid carton interfaces found using the STI data, the binocular stereo depth image data can be queried to determine the depth of the interfaces. Suitable examples of binocular stereo devices, systems, and methods may be found within U.S. Pat. No. 5,383,013, filed on Sep. 18, 1992, and U.S. Pat. No. 7,444,013, filed on Aug. 10, 2001, the contents of each of which are hereby incorporated by reference, in their entireties, including but without limitation, those portions directed to stereo image analysis.
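
One hedged way to express that depth query in code is sketched below; the use of a median over the region and the camera-space (pixel x/y, depth z) output are assumptions made for illustration.

    import numpy as np

    def pick_point_for_region(region_bounds, stereo_depth_map):
        # region_bounds: pixel box (x0, y0, x1, y1) delimited by valid STI interfaces.
        # Query the binocular stereo depth map over that carton face and return an
        # (x, y, z) pick point in the camera's coordinate space.
        x0, y0, x1, y1 = region_bounds
        depth = float(np.median(stereo_depth_map[y0:y1, x0:x1]))
        return ((x0 + x1) / 2.0, (y0 + y1) / 2.0, depth)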

Binocular stereo approaches may use image correspondence between images from multiple cameras that are separated by a known distance to produce a depth map of the carton wall. For example, each of two cameras may use a wide-angle lens that causes a significant distortion of their image. The process of correcting the image for this distortion may be included in the calibration of the system. Calibration of each camera and its wide-angle lens may preferably be performed at the factory and/or prior to installation of the cameras for use. Calibration of each camera and lens may not require a wall of cartons or the robot itself. In some embodiments, camera and lens calibration for binocular stereo techniques may be performed for at least two cameras (e.g., a pair) and lenses in their enclosure while installed on the robot. The binocular stereo calibration process may use a checker board pattern presented at multiple angles and locations over the entire field of view of the image for each camera, and over 400 images may be acquired to produce accurate calibration.
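
A minimal per-camera sketch of that checkerboard calibration using OpenCV is shown below; the pattern size and square spacing are assumptions, the images are assumed grayscale, and pairwise stereo calibration (e.g., cv2.stereoCalibrate) would follow the same pattern.

    import cv2
    import numpy as np

    def calibrate_single_camera(checkerboard_images, pattern=(9, 6), square_mm=25.0):
        # Reference grid of checkerboard corner positions (in mm) on the target plane.
        grid = np.zeros((pattern[0] * pattern[1], 3), np.float32)
        grid[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square_mm
        object_points, image_points = [], []
        for image in checkerboard_images:                 # grayscale views of the target
            found, corners = cv2.findChessboardCorners(image, pattern)
            if found:
                object_points.append(grid)
                image_points.append(corners)
        size = checkerboard_images[0].shape[::-1]         # (width, height)
        return cv2.calibrateCamera(object_points, image_points, size, None, None)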

Binocular stereo calibration can ensure that the coordinates derived from images are accurate when translated into the coordinate space of the binocular stereo vision system. An alignment procedure can be required to translate these coordinates into the robot coordinate space. The alignment procedure may be required each time the camera enclosure containing the at least two cameras is installed on a robot. The alignment procedure can provide the rotation and translation matrices required to accurately translate the vision system coordinate space into the robot's pick coordinates. Initial calibration as part of STI can be required for any camera with a lens distortion. As previously mentioned, the binocular stereo calibration can be performed offline at a factory prior to installation of the camera.

Yet, Photometric Stereo approaches can require calibration of the cameras together with the light sources and/or with the actual background light of the environment to be navigated. For example, Photometric Stereo calibration can be required to model the variation of the greyscale intensity at many locations in the image due to the angle and/or intensity of each light. This variation in grayscale intensity values can require in-situ calibration on the robot and/or within the trailer environment to account for variations in environmental light, such as from multiple reflections of the trailer walls, ceiling, and/or floors and their respective geometries. The Photometric Stereo (PS) model can then be used to differentiate between reflectance changes and surface shape (surface normal) changes when imaging an unknown object.

In contrast to Photometric Stereo techniques, which require extensive calibration between cameras and lights, STI does not depend on the calibration of the vision system. Rather, STI can model variation of the pixel intensity due to lighting intensity and/or orientation and/or distance during operation, in lieu of a separate, in-situ calibration process. This can be a significant advantage, for example, in industrial applications.

In some embodiments, the shading topography image may comprise a convolutional shading topography image developed by a convolutional neural network. As discussed in additional detail herein, image sets can be inputted to the convolutional neural network to train the neural network. The image sets may be inputted according to multiple channels, such as 3-channels like the common red, green, blue visual channels. In the illustrative embodiment, the three channel stacks for image sets can include [binocular stereo], [top light angle 1, bottom light angle 2], [top light angle 2, bottom light angle 1]. The channel stacks for [top light angle 1, bottom light angle 2] and [top light angle 2, bottom light angle 1] may be formed as the pixel difference between top light angle 1 and bottom light angle 2, and the pixel difference between top light angle 2 and bottom light angle 1. In some embodiments, greater or fewer than 3-channels may be applied, and/or other channel stacks. In some embodiments, the channels and/or channel stacks may be determined by threshold evaluation of performance for a predetermined set of images for specific environmental conditions, for example, by least error compared with a ground truth set of results with the fastest inference speed.
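
The channel stacking described above can be sketched as follows, assuming the five captured images are aligned NumPy arrays of equal size; whether the first channel carries the raw binocular stereo image or a derived depth map is left open here, as in the text.

    import numpy as np

    def build_channel_stack(stereo, top_angle_1, top_angle_2, bottom_angle_1, bottom_angle_2):
        # Three channels: the binocular stereo data, and the two cross pixel
        # differences between opposing light angles, as described above.
        ch1 = stereo.astype(np.float32)
        ch2 = top_angle_1.astype(np.float32) - bottom_angle_2.astype(np.float32)
        ch3 = top_angle_2.astype(np.float32) - bottom_angle_1.astype(np.float32)
        return np.stack([ch1, ch2, ch3], axis=-1)          # H x W x 3 network input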

Referring to FIG. 8, an alignment procedure for STI techniques can allow a maintenance technician to install a new vision system on a robot with ease. The procedure may use a Lidar system having two scanning Lidar sensors 871a, 871b that can provide accurate depth measurements, for example, +/− about 0.04 inches (1 mm) for a range of about 78 inches (2 m). The Lidar sensors 871a, 871b are illustratively mounted on the end-of-arm tool of the robot as suggested in FIG. 8. The precise location of the Lidar sensors 871a, 871b with respect to the end effector 8710 of the robot may be known from design and/or can be confirmed by measurement. Custom alignment targets 876, 879 are illustratively positioned in at least nine positions over the carton wall.

The targets 876, 879 are illustratively embodied as being formed as a thin sheet of plastic having a flange for insertion between cartons of the carton wall such that a front surface of the target faces the robot. The front surfaces of the targets are embodied as about 6 inches (150 mm) tall and protrude from the carton wall by about 1 inch (25 mm) or more. The targets 876, 879 illustratively include a graphic, such as a fiducial pattern, that can be detected by the STI vision system. The fiducial pattern can be formed to include a cross-hair, a checker board pattern, and/or a bullseye. For example, a fiducial pattern including a checker board pattern may allow precise detection of tilt and/or position of the target by the vision system. One suitable example of fiducial pattern recognition which may be applied in the present disclosure is disclosed within U.S. Pat. No. 7,039,229, issued on May 2, 2006, the contents of which are incorporated by reference herein, including but without limitation, those portions related to pattern recognition.

During the alignment process, the technician may use a special mode of the robot to direct the robot to a specific set of locations in which the Lidar sensors 871a, 871b mounted on the end-of-arm tools are arranged within about 2 inches (50 mm) of an alignment target 876, 879. With this arrangement, the robot (e.g., via its control system, such as a programmable logic controller) may send a signal to the vision system to initiate recording of the Lidar data. Once this Lidar data has been recorded, the vision system may initiate the robot to proceed to the next target 876, 879 location. After target data from each target 876, 879 has been recorded, the robot can be directed to an imaging position to allow the cameras to acquire images of the carton wall with all of the targets 876, 879.

The vision system may detect the targets 876, 879 in the captured images and the position of the fiducial patterns on the targets 876, 879. Target and/or fiducial pattern detection may be performed by a processor executing software and/or firmware stored on memory, such as a dedicated memory device. The vision system may use the Lidar data acquired at each robot position to detect the position of each target 876, 879. Each target 876, 879 can be detected with signal processing of the Lidar data. The spacing of about 1 inch (25 mm) between each target 876, 879 and the carton wall can produce a clear step in the Lidar signal that can be used to automatically detect the location of each target 876, 879 within the Lidar data. The distance between the Lidar sensors 871a, 871b and each target 876, 879 can be recorded for that Lidar signal. The edges of the step in the Lidar signal for each target 876, 879 can be used to determine the distance between the target, the ceiling, and/or the floor using the vertically oriented Lidar sensor 871b. Using the horizontally oriented Lidar 871a, the distance between the side walls 877 of the trailer and the edges of each target 876, 879 can be determined. Since the coordinates of the positions to which the robot was directed for assessing each target 876, 879, and/or the offsets from each target 876, 879, have been measured (e.g., spacing from sidewalls, ceiling, and/or floor), a regression method can be used to determine the three-dimensional rotation and/or translation matrix in order to translate the vision system coordinates into robot coordinates.

As shown in FIG. 9, the location of an alignment target 986 in an image relative to a wall 988 of the trailer defines a distance 982. The location of the target 986 relative to the floor 987 of the trailer defines a distance 981. The distances 981, 982 can determine the horizontal and vertical location of the target 986 in the vision system coordinate space. The binocular stereo image can be used to determine the depth of the target 986. The distances 981, 982, together with the depth information of the binocular stereo image, can be used to define the coordinates of the target 986 in the vision system coordinate space.

Referring still to FIG. 9, the vertical lidar scanner is shown as 987, and the location of a target according to the vertical lidar scanner 987 is shown as 983. The vertical location 983 of the target can be automatically detected using common signal processing techniques to detect step changes within a one dimensional signal, to define the vertical position of the target in the robot coordinate space. The horizontal lidar scanner 989 can determine the horizontal location of the target as 984. The horizontal location 984 can be automatically detected using common signal processing techniques to detect step changes within a one dimensional signal, to define the horizontal coordinate of the target in the robot coordinate space. The depth of the target can be measured directly by both the horizontal and vertical lidar scanners 987, 989 in the regions of the step changes 986, 985. These direct measurements can be compared with each other. If the direct measurements are consistent with each other, the direct measurements can be averaged to determine the depth of the target in the robot coordinate space. A regression between the vision coordinate space values and the robot coordinate space values can be calculated to determine the learned alignment of the combined robot-vision system. The learned alignment can be used to transform the measured vision coordinate values to robot coordinate space values.
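
One common way to realize that regression is a least-squares rigid transform fit (Kabsch/SVD) between the paired target coordinates; the sketch below is illustrative and not necessarily the regression method used in the disclosure.

    import numpy as np

    def fit_vision_to_robot_transform(vision_points, robot_points):
        # Least-squares rotation R and translation t so that robot ≈ R @ vision + t,
        # estimated from matched alignment-target coordinates in both spaces.
        P = np.asarray(vision_points, float)   # N x 3, vision coordinate space
        Q = np.asarray(robot_points, float)    # N x 3, robot coordinate space
        Pc, Qc = P - P.mean(axis=0), Q - Q.mean(axis=0)
        U, _, Vt = np.linalg.svd(Pc.T @ Qc)
        d = np.sign(np.linalg.det(Vt.T @ U.T))           # guard against reflections
        R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
        t = Q.mean(axis=0) - R @ P.mean(axis=0)
        return R, t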

Referring now to FIG. 17, a diagrammatic view of the robotic assembly 110 is shown. The robotic assembly 110 includes vision system 126 and robotic system 130. The robotic system 130 includes robotic structures 132 (e.g., appendages, supports, wheels, etc.), robotic actuator/operator devices 134 for robotic structure operations (e.g., sensors, motors, pumps, lubricant/coolant/hydraulic fluid storage, electrical power storage, and related auxiliaries), and may optionally include one or more control systems 136 (e.g., processors, memory, and communications circuitries) for conducting robotic structure and/or actuator/operator operations, although in some embodiments, processors, memory, and/or communications circuitry may be partly or wholly shared with the vision system 126.

The vision system 126 includes a control system 140 for governing vision system operations. The control system 140 includes processor 142, memory 144 storing instructions for execution by the processor 142, and communications circuitry 146 for communicating signals to and/or from the processor 142. The vision system 126 includes cameras 148 and light sources 150 in communication with the control system 140 for operation as discussed herein. Examples of suitable processors may include one or more microprocessors, integrated circuits, system-on-a-chips (SoC), among others. Examples of suitable memory may include one or more primary storage and/or non-primary storage (e.g., secondary, tertiary, etc. storage); permanent, semi-permanent, and/or temporary storage; and/or memory storage devices including but not limited to hard drives (e.g., magnetic, solid state), optical discs (e.g., CD-ROM, DVD-ROM), RAM (e.g., DRAM, SRAM, DRDRAM), ROM (e.g., PROM, EPROM, EEPROM, Flash EEPROM), volatile, and/or non-volatile memory, among others. Communication circuitry includes components for facilitating processor operations; for example, suitable components may include transmitters, receivers, modulators, demodulators, filters, modems, analog to digital converters, operational amplifiers, and/or integrated circuits.

The robotic assembly 110 is illustratively embodied to include a neural network 160 as a portion of the vision system 126. The neural network 160 is illustratively embodied as a deep neural network, and more specifically as a convolutional neural network, for developing and executing convolution of images to provide a shading topography image, namely a convolutional shading topography image formed as a composite of information of the images. Although, in the illustrative embodiment, the neural network 160 is shown as a portion of the vision system 126 distinct from the processor 142, memory 144, and communications circuitry 146, in some embodiments the neural network 160 may be formed with components partly or wholly shared with other components of the vision system 126, with other features of the robotic assembly 110 (e.g., robotic system 130, cameras 148, light sources 150), and/or as partly or wholly remote from and in communication with the robotic assembly 110.

In the illustrative embodiment, the neural network 160 is trained according to various image sets obtained by the vision system 126 of the robotic assembly 110. Training the neural network 160 for use in handling cargo can comprise input and analysis of image sets from tens, hundreds, or thousands (or more) of cargo walls, and may be updated with additional image sets even after field implementation. Training the neural network 160 can comprise feedback in which the outputs of the neural network 160 are compared with threshold values, and pass/fail ratings are returned to the neural network 160 for consideration (learning). In some embodiments, correction information containing the expected output values may be returned to the neural network. Threshold value comparison may include computer comparison, manual (human) comparison, and/or combinations thereof. For example, a training convolutional shading topography image can be developed by the neural network 160 based on the inputted training image set(s), and the training convolutional shading topography image can be applied to develop training pick coordinates. The training pick coordinates can then be compared with the actual pick coordinates of the cartons in the training images, and threshold values can be applied to determine whether the training pick coordinates are within an acceptable range of the actual pick coordinates. The neural network 160 can be deemed to be a trained neural network 160 by achieving threshold accuracy in providing pick coordinates over a threshold number of groups of image sample sets. Threshold accuracy in pick coordinates for individual training image sets and/or groups of image sample sets may be application specific, and may vary based on environmental conditions including expected cargo variability (size, material, color, position, etc.). In some embodiments, the trained neural network 160 may include neural networks which originate from proprietary configurations (e.g., pre-configured neural networks), which are thereafter further trained according to application-specific training sets.
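
By way of a hedged illustration of the threshold comparison described above, the pass/fail rating and the "trained" determination could be computed as sketched below; the tolerance, accuracy, and sample-count values are assumptions, not values specified by the disclosure.

    # Illustrative pass/fail check for training pick coordinates (all thresholds assumed).
    import numpy as np

    def pick_pass_fail(training_picks, actual_picks, tolerance_mm=25.0):
        """Return a per-carton pass/fail rating based on pick-coordinate error."""
        error = np.linalg.norm(np.asarray(training_picks, float)
                               - np.asarray(actual_picks, float), axis=1)
        return error <= tolerance_mm

    def deemed_trained(per_set_accuracy, required_accuracy=0.95, required_sets=100):
        """Deem the network trained once threshold accuracy is achieved over a
        threshold number of image sample sets (both thresholds are assumed)."""
        recent = per_set_accuracy[-required_sets:]
        return len(recent) >= required_sets and all(a >= required_accuracy for a in recent)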

The trained neural network 160 can be deployed for use with the robotic assembly 110 to handle cargo. Active (non-training) image sets can be provided from the vision system 126 to develop the convolutional shading topography image by the trained neural network 160. Actual (non-training) pick coordinates can be determined based on the convolutional shading topography image. In the illustrative embodiment, the neural network 160 determines the pick coordinates from the convolutional shading topography image, partly or wholly by the neural network 160 itself and/or in conjunction with other systems apart from the robotic assembly 110. The convolutional shading topography image is illustratively formed partly on the basis of the channel stack including the binocular stereo image data, within one or more filter layers of the neural network 160. The filter layers may be embodied as convolutional kernels, which may each comprise a predetermined matrix of integers for use on a subset of input pixel values of the captured images. The kernels may be selected by the convolutional neural network itself. In some embodiments, the convolutional shading topography image may be formed in one or more filter layers independent from the binocular stereo image data, and may be combined with the binocular stereo image data in another one or more filter layers of the neural network 160. Determined pick coordinates can be communicated to the robotic assembly 110 for use in handling the cargo. In some embodiments, the neural network 160 provides the convolutional shading topography image to the robotic assembly 110 to determine the pick coordinates. In some embodiments, the convolutional shading topography image may be derived from the neural network processing of multiple image channels which each comprise a predetermined composite stack of information of a plurality of images captured by the at least one camera.
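
A minimal sketch of forming a convolutional shading topography image from a channel stack is given below. The hand-written kernel stands in for one learned filter layer, and the channel ordering is an assumption; in the illustrative embodiment, the kernels are selected by the convolutional neural network itself.

    # Sketch only: one fixed kernel used in place of learned filter layers.
    import numpy as np
    from scipy.signal import convolve2d

    def convolutional_sti(channel_stack):
        """channel_stack: (C, H, W) array, e.g. top-lit, bottom-lit, and
        binocular stereo depth channels captured by the camera(s)."""
        kernel = np.array([[1.0], [0.0], [-1.0]])  # simple vertical-gradient kernel
        responses = [np.abs(convolve2d(ch, kernel, mode="same")) for ch in channel_stack]
        # Blend the per-channel responses into one composite image.
        return np.sum(responses, axis=0)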

Within the devices, systems, and methods of the present disclosure, at least two cameras of the vision system can determine the vision coordinate space. Other elements of the vision system can be replaced and/or installed without affecting the alignment of the camera pair, and thus the aligned coordinate space. Exemplary instances which may require alignment include (1) after initial installation; (2) after removal and/or re-installation of the camera enclosure, for example, for robot maintenance; (3) upon replacement of a camera and/or the camera enclosure; and/or (4) as a periodic check to account for drift and/or accidental movement, for example, due to collision and/or shock.

Devices, systems, and methods of the present disclosure may be adapted to consider shading and/or shadows caused by depth variation. For example, when cartons of a carton wall are not at the same depth as each other (viewed from the front), shading may be created, for example, as a shadow from one of the cartons being cast on another carton. The shadow characteristics can depend on the position of one or more of the cartons (casting or receiving) relative to the light sources. FIG. 15 shows a depiction in which adjacent cartons (A, B) vary in the depth of their faces (A₁, B₁) along the z axis (vertical in FIG. 14). The carton B is illustratively arranged closer to the light source 151 along the x axis and is also closer to the camera along the z axis. From the light source 151, carton B casts a shadow onto the carton A, and the region of shading at the interface between the cartons (A, B) is larger than it would be if there were no depth variation (i.e., if the faces A₁, B₁ were each arranged at the same position along the z axis). The shadow can be analyzed similarly to the STI signal and used to detect the interface of the cartons (A, B).

FIG. 16 shows the carton B arranged closer to the light along the x axis and further from the light along the z axis compared to carton A. The drop in the image intensity at the edge of the cartons (A, B) is followed by an increase in the reflection from the side wall of the adjacent carton, such that there is no shadow from depth variation in this case. Accordingly, the arrangement of the cartons (A, B) can be determined.

Illustrative embodiments have included disclosure concerning trailers having structure which imposes little or no impact on the imaging. However, devices, systems, and methods within the present disclosure may be adapted to accommodate variation of imaging due to trailer type and carton wall position within the trailer. For example, trailer types may include SEV (98″ wide +4/−0″ and 126″ tall +0/−2″) and SF (102″ wide +0/−2″ and 110″ tall +1/−2″). The trailer type can affect the size of the wall of cartons within the trailer, for example, when fully loaded. In one suitable example, the carton wall may be positioned into the body of the trailer by a distance of about 64 inches (1625 mm) from the base of the robot (J1).

The transition from a loading dock to the trailer can be affected by the relative height of the trailer compared with the loading dock. For example, the floor of some trailers can be lower or higher than the floor of the loading dock. Dock levelers can be used to transition between docks and trailers having floors of different heights. For example, a dock leveler may include a ramp between the floors of the loading dock and the trailer. While the robot, or a portion of the robot, is positioned on the dock leveler, cameras and/or light sources mounted to the robot may be arranged with tilt because the robot is not on level support; for example, the cameras and/or light sources may be tilted with respect to the wall of cartons within the trailer. By way of example, dock levelers may create angles with respect to the entry of the trailer within the range of about −5 to about 5 degrees, although in some instances lesser or greater angles may also be created.

Such tilt can cause a systematic depth variation that can affect the lighting profile over the carton wall. The STI approach may be optimized to account for this systematic depth variation. Due to the tilt of the camera, the cartons may appear in a lower (or higher) portion of the image than when the robot is positioned within the body of the trailer. This lower (or higher) portion of the camera image can have lesser resolution than the center of the image for well-known lens geometry reasons. The ST algorithm may need to be optimized in order to obtain the required robot picking accuracy for this region. In addition, since the camera is centrally located, the tilt of the robot can increase the occlusion of one carton by another carton above or below it. This occlusion can cause the size of the carton to appear to be smaller, which needs to be accommodated by the logic that determines the final carton locations. The overall tilt of the carton wall must also be factored into the transformation of the carton locations from the vision system coordinate space to the robot coordinate space. The distance from the camera(s) to the carton wall may be about 72 inches (1825 mm) when the robot is at least partly on the dock leveler (this includes additional distance beyond the 64 inches mentioned above, as the angle imposed by the dock leveler can alter the field of view). The position of the robot being partly or wholly on the dock leveler can be determined based on the robot distance relative to the trailer, robot sensor data (e.g., an angular sensor), and/or detection of the perceived tilt of the carton wall relative to the robot. Distance into the trailer may be determined based on a combination of extendable length, string encoder, and/or fixed vehicle length measurements.
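
One hedged way to combine these cues (distance relative to the trailer and perceived tilt of the carton wall) is sketched below; the threshold values and names are assumptions only, not parameters recited in the disclosure.

    # Assumed heuristic for deciding whether the robot is still on the dock leveler.
    def likely_on_dock_leveler(distance_into_trailer_in, perceived_wall_tilt_deg,
                               entry_zone_in=72.0, tilt_threshold_deg=1.0):
        """Combine the robot's distance relative to the trailer with the perceived
        tilt of the carton wall (and/or an angular sensor reading)."""
        near_entry = distance_into_trailer_in <= entry_zone_in
        tilted = abs(perceived_wall_tilt_deg) >= tilt_threshold_deg
        return near_entry and tilted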

Carton stacking variations, movements of the cartons within the trailer (for example, during transport), and/or other factors can cause the depth of the face of cartons of a carton wall to vary, for example, within a range of +/−20 inches (500 mm) from the average depth of a carton wall. If the depth of a carton is detected to be larger than this allowed variation, the carton may be determined to be part of a carton wall that is positioned behind the current carton wall being picked. A complete carton wall can have cartons from floor to ceiling, and can have cartons from side wall to side wall. A carton wall is complete if no more than one carton is missing from the top of any column. A depth image can be used to detect partial carton walls, e.g., carton walls which are missing more than one carton, for example, from the top of one of the columns of the carton wall. The information about the type of wall can be communicated to the robot.
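
The depth checks described in this passage can be sketched as follows; the 20 inch allowance and the one-missing-carton rule come from the passage, while the data layout and names are assumptions.

    # Sketch of the carton-wall depth and completeness checks.
    import numpy as np

    DEPTH_TOLERANCE_IN = 20.0  # allowed variation about the average wall depth

    def part_of_current_wall(carton_depth_in, wall_depths_in):
        """A carton deeper than the allowed variation is treated as belonging to
        the wall behind the one currently being picked."""
        return abs(carton_depth_in - np.mean(wall_depths_in)) <= DEPTH_TOLERANCE_IN

    def wall_is_complete(missing_from_top_per_column):
        """A wall is complete if no more than one carton is missing from the top
        of any column."""
        return all(m <= 1 for m in missing_from_top_per_column)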

The present disclosure includes step and/or back wall detection. Some trailers may have a step within their body, closer to the front of the trailer (away from the rear gate). This step may need to be detected by the vision system to avoid inappropriate treatment by the robot. STI can be used to detect the step by inspecting for an absence of the edges that would otherwise be found between cartons. The distance within the trailer (e.g., distance from the rear gate) at which a step would be expected is generally known according to the trailer size (length), and the known distance that the robot travels into the trailer (e.g., distance traveled from the rear gate) can be compared with the expected distance for the step, to assist the vision system in detecting the presence of a step. Similarly, detection of the back wall of the trailer can be performed based on the distance traveled into the trailer and/or the lack of STI edges on the back wall.
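
As a hedged sketch, the step/back wall decision can combine the traveled distance with the absence of STI edges; the distance tolerance below is an assumption.

    # Assumed combination of traveled distance and STI edge count.
    def detect_step_or_back_wall(distance_into_trailer_in, expected_feature_distance_in,
                                 sti_edge_count, distance_tol_in=12.0):
        """Flag a step or back wall when the robot is near the distance at which
        one is expected and the STI shows no carton-to-carton edges."""
        near_expected = abs(distance_into_trailer_in - expected_feature_distance_in) <= distance_tol_in
        return near_expected and sti_edge_count == 0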

Notably, trailers can have structural ribs formed on their ceilings, and these ceiling ribs can affect the geometry of the lighting, for example, by causing multiple reflections of the light, which can cause shadow artifacts on the surface of the wall of cartons. The two lights of the light sources, with different lighting angles at each of the top and bottom locations, can reflect differently from the ceiling ribs. Comparing images captured under the different light sources can allow for detection of the presence of ribs, which can then be used to identify the light source that minimizes the effect on the ST imaging.
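
A minimal sketch of choosing the light source least affected by ceiling-rib reflections is given below; the artifact score is an assumed heuristic, not a method recited in the disclosure.

    # Assumed heuristic: pick the light whose image shows the least rib-shadow structure.
    import numpy as np

    def least_rib_affected_light(images_by_light):
        """images_by_light: dict mapping a light-source id to the (H, W) image
        captured under only that light."""
        def artifact_score(img):
            img = np.asarray(img, float)
            top_band = img[: img.shape[0] // 4]  # region nearest the ceiling
            return float(np.std(np.diff(top_band, axis=1)))
        return min(images_by_light, key=lambda k: artifact_score(images_by_light[k]))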

Once carton edges in the ST image are analyzed and the most likely candidates are identified as carton interfaces, the distances between the identified edges can be used to determine the carton size at each location in a carton wall. If the carton sizes are limited to known sizes that occur in the supply chain of a business, then the sizes of the cartons at each location can be checked against this list of possibilities. If the measured carton is close to a known size, then the location of the interface can be updated depending on a confidence score of the identified edge. If the confidence score is relatively high and the carton is measured to be smaller than expected from the known carton size, then the identified edge may not be modified. For example, cartons can be damaged and/or appear to be smaller when they wear out, such that a high confidence score can assist in determining whether to disregard unexpected size differences.

In other instances, the identified edge may be updated; for example, the location of the identified edge may be corrected according to the known sizes of cartons. The distances between the edges can be used to re-compute the carton sizes. This process can be iteratively repeated until the difference between measured and expected carton sizes is within an acceptable error range, for example, about +/−0.25 inches (10 mm). In some embodiments, the acceptable error can be determined according to an offset that the robot can tolerate before it misses a pick on a carton, for example, about +/−7 inches.
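
A hedged sketch of this reconciliation follows; the list of known sizes and the confidence threshold are hypothetical, while the 0.25 inch tolerance is taken from the passage.

    # Sketch of checking a measured carton size against known carton sizes.
    KNOWN_SIZES_IN = [12.0, 16.0, 24.0]   # hypothetical supply-chain carton widths
    SIZE_TOLERANCE_IN = 0.25

    def reconcile_carton_size(measured_in, edge_confidence, confidence_threshold=0.8):
        """Decide whether to keep the measured size or correct it toward a known size."""
        nearest = min(KNOWN_SIZES_IN, key=lambda s: abs(s - measured_in))
        if abs(nearest - measured_in) <= SIZE_TOLERANCE_IN:
            return measured_in  # already within the acceptable error range
        if edge_confidence >= confidence_threshold and measured_in < nearest:
            return measured_in  # high confidence: keep the edge (carton may be damaged or worn)
        return nearest          # otherwise correct the measurement toward the known carton size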

This approach can use business information to logically constrain the locations of the observed cartons according to known parameters, such as known carton dimensions. This business logic approach can be used during the identification of the STI edges by restricting the edge identification approach to locations where a carton interface is expected. Business logic can be used to correct the misidentification of a split or crack in a partly opened carton as a false edge, for example, by disregarding the false edge. Business logic can be used to correct the misidentification of an elongated carton (merged carton) by introducing an edge where one was not found by image interpretation. Correction of the misidentification of an elongated carton can include a more aggressive search for edges in the region in which the expected edge was not identified in the STI, for example, by affording more weight to light intensity changes in the region in which an edge would have been expected according to known carton dimensions but was not identified.

As shown in FIG. 7, a process may include positioning the robot, acquiring one or more images, and determining coordinates. The robot may be arranged in position by approaching a carton wall to obtain an imaging position. The robot may verify its positioning by any suitable method, for example, by measuring distance to known reference elements. Upon desirable positioning, the process may proceed to image capture (acquisition). During image capture, the cameras may capture various images. The various images may include applying only a top light source and capturing one or more images with each of the two cameras. The various images may include applying only a bottom light source and capturing one or more images with each of the two cameras. An STI image may be computed, and a depth image may be captured and/or computed. Case coordinate data may be determined based on a combination of the STI image and the depth image. The process may proceed to compute pick coordinates.
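
As a minimal, runnable sketch of the STI computation step in this sequence, under the assumption (consistent with claim 20 below) that the shading topography image is an absolute value of a gradient sum of intensity values of the captured images; the array names are illustrative only:

    # Sketch: STI as the absolute value of a gradient sum of the top-lit and
    # bottom-lit intensity images (one plausible reading of the disclosure).
    import numpy as np

    def compute_sti(top_lit, bottom_lit):
        """top_lit, bottom_lit: (H, W) intensity images captured under only the
        top light source and only the bottom light source, respectively."""
        grad_top = np.gradient(np.asarray(top_lit, float), axis=0)
        grad_bottom = np.gradient(np.asarray(bottom_lit, float), axis=0)
        return np.abs(grad_top + grad_bottom)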

Based on the case coordinate data, a determination may be made as to a stack type. SKU transitions, indicating the sizes of cases or cartons, may be determined within the case coordinate data. If SKU transitions cannot be determined, an incorrect SKU may be determined, and a correction to the SKU may be applied and/or the SKU may be removed from consideration. If the determined SKU transitions indicate correspondence with known sizes of cases or cartons, the determined SKU transitions may be output for robot operation in picking.

While certain illustrative embodiments have been described in detail in the figures and the foregoing description, such an illustration and description is to be considered as exemplary and not restrictive in character, it being understood that only illustrative embodiments have been shown and described and that all changes and modifications that come within the spirit of the disclosure are desired to be protected. There are a plurality of advantages of the present disclosure arising from the various features of the methods, systems, and articles described herein. It will be noted that alternative embodiments of the methods, systems, and articles of the present disclosure may not include all of the features described yet still benefit from at least some of the advantages of such features. Those of ordinary skill in the art may readily devise their own implementations of the methods, systems, and articles that incorporate one or more of the features of the present disclosure.

We claim:
 1. A robotic assembly for handling cargo, the assembly comprising: at least one robotic appendage for handling cargo; and a vision system for acquiring information to assist operation of the at least one robotic appendage, the vision system comprising: at least two light sources arranged spaced apart from each other; at least one camera configured to capture images of cargo for handling by the robotic assembly; and a vision control system for determining position of cargo, the vision control system including at least one processor for executing instructions stored on a memory to determine position of cargo based on a shading topography image comprising a blended composite of information of a plurality of images captured by the at least one camera.
 2. The robotic assembly of claim 1, wherein the shading topography image comprises a convolutional shading topography image comprising a convolution of the plurality of images captured by the at least one camera.
 3. The robotic assembly of claim 2, wherein the convolutional shading topography image is generated by a neural network.
 4. The robotic assembly of claim 2, wherein the convolution of the plurality of images comprises convolution of predefined channels of images captured by the at least one camera.
 5. The robotic assembly of claim 4, wherein one of the predefined channels comprises image data captured under illumination by at least one, but fewer than all, of the at least two light sources, and other image data captured with illumination by at least another one, but fewer than all, of the at least two light sources.
 6. The robotic assembly of claim 5, wherein the image data captured under illumination by at least one, but fewer than all, of the at least two light sources comprises image data captured under illumination by one of the at least two light sources having a first upper directional lighting trajectory and the other image data captured with illumination by at least another one, but fewer than all, of the at least two light sources comprises image data captured under illumination by another one of the at least two light sources having a first lower directional light trajectory.
 7. The robotic assembly of claim 6, wherein another one of the predefined channels comprises image data captured under illumination by at least one, but fewer than all, of the at least two light sources having a second upper directional light trajectory, different from the first upper directional light trajectory, and other image data captured with illumination by at least another one, but fewer than all, of the at least two light sources having a second lower directional light trajectory, different from the first lower directional light trajectory.
 8. The robotic assembly of claim 4, wherein at least one of the predefined channels comprises image data captured with illumination by greater than one of the at least two light sources.
 9. The robotic assembly of claim 8, wherein the image data captured with illumination by greater than one of the at least two light sources includes image data captured under illumination by first and second directional light trajectories for each of the at least two light sources.
 10. The robotic assembly of claim 1, wherein the at least one camera is arranged between two of the at least two light sources.
 11. The robotic assembly of claim 10, wherein the at least one camera is configured to capture at least one image of cargo under illumination by at least one of the at least two light sources and at least another image of cargo under illumination by another one of the at least two light sources.
 12. The robotic assembly of claim 11, wherein configuration of the at least one camera to capture the at least one image includes configuration to capture the at least one image under illumination by at least one of the at least two light sources without illumination by another of the at least two light sources, and configuration of the at least one camera to capture the at least another image includes configuration to capture the at least another image under illumination by at least the another of the at least two light sources without illumination by the at least one of the at least two light sources.
 13. The robotic assembly of claim 1, wherein the at least one camera is coupled with a robotic unloading machine comprising the at least one robotic appendage.
 14. The robotic assembly of claim 13, wherein at least one of the at least two light sources is coupled with the robotic unloading machine.
 15. The robotic assembly of claim 1, wherein at least one of the at least two light sources includes a first light having a first directional lighting trajectory and a second light having a second light trajectory.
 16. The robotic assembly of claim 1, wherein the vision control system is adapted to conduct an imaging sequence including communicating with the at least one camera to capture one or more images of a wall of cargo under a predetermined illumination scheme of the at least two light sources.
 17. The robotic assembly of claim 16, wherein the predetermined illumination scheme includes one or more images having none of the at least two light sources illuminated, one or more images having all of the at least two light sources illuminated, and one or more images having fewer than all of the at least two light sources illuminated.
 18. The robotic assembly of claim 17, wherein the one or more images having fewer than all of the at least two light sources illuminated includes at least one image under illumination by only a first light of one of the at least two light sources having a first directional lighting trajectory.
 19. The robotic assembly of claim 18, wherein the one or more images having fewer than all of the at least two light sources illuminated includes at least one image under illumination by only a second light of the one of the at least two light sources having a second directional lighting trajectory, different from the first directional lighting trajectory.
 20. The robotic assembly of claim 1, wherein the shading topography image comprises an expression of absolute value of a gradient sum of intensity values of a number of images of cargo acquired by the at least one camera.
 21. A vision system of a robotic assembly for handling cargo, the vision system comprising: at least two light sources arranged spaced apart from each other; at least one camera configured to capture images of cargo for handling by the robotic assembly; and a vision control system for determining position of cargo, the vision control system including at least one processor for executing instructions stored on a memory to determine position of cargo based on a shading topography image comprising a blended composite of information of a plurality of images captured by the at least one camera.